Few-Shot Speaker Identification Using Lightweight Prototypical Network With Feature Grouping and Interaction

Abstract

Existing methods for few-shot speaker identification (FSSI) obtain high accuracy, but their computational complexities and model sizes need to be reduced for lightweight applications. In this work, we propose an FSSI method using a lightweight prototypical network, with the final goal of implementing FSSI on intelligent terminals with limited resources, such as smart watches and smart speakers. In the proposed prototypical network, an embedding module is designed to perform feature grouping, which reduces the memory requirement and computational complexity, and feature interaction, which enhances the representational ability of the learned speaker embedding. In the proposed embedding module, the audio feature of each speech sample is split into several low-dimensional feature subsets that are transformed by a recurrent convolutional block in parallel. Then, the operations of averaging, addition, concatenation, element-wise summation, and statistics pooling are sequentially executed to learn a speaker embedding for each speech sample. The recurrent convolutional block consists of a block of bidirectional long short-term memory and a block of de-redundancy convolution, in which feature grouping and interaction are conducted too. Our method is compared to baseline methods on three datasets selected from three public speech corpora (VoxCeleb1, VoxCeleb2, and LibriSpeech). The results show that our method obtains higher accuracy under several conditions, and has advantages over all baseline methods in computational complexity and model size.
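
The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the recurrent convolutional block and the feature-interaction operations (averaging, addition, element-wise summation) are omitted, and each feature subset is simply pooled and concatenated. All function names (`split_into_groups`, `statistics_pooling`, `embed`, `prototypes`, `identify`) and the use of Euclidean distance are assumptions made for illustration only.

```python
import math
from statistics import fmean, pstdev


def split_into_groups(frames, num_groups):
    """Split every frame's feature vector into equal low-dimensional subsets."""
    dim = len(frames[0])
    if dim % num_groups != 0:
        raise ValueError("feature dimension must be divisible by num_groups")
    size = dim // num_groups
    return [[frame[g * size:(g + 1) * size] for frame in frames]
            for g in range(num_groups)]


def statistics_pooling(frames):
    """Collapse a frame sequence into per-dimension means followed by std devs."""
    columns = list(zip(*frames))
    return [fmean(c) for c in columns] + [pstdev(c) for c in columns]


def embed(frames, num_groups=2):
    """Placeholder embedding: pool each feature subset, then concatenate.

    Assumption: the learned per-group transform from the paper is replaced
    by an identity mapping here.
    """
    embedding = []
    for group in split_into_groups(frames, num_groups):
        embedding.extend(statistics_pooling(group))
    return embedding


def prototypes(support):
    """Average the support embeddings of each speaker into one prototype."""
    return {speaker: [fmean(col) for col in zip(*embeddings)]
            for speaker, embeddings in support.items()}


def identify(query, protos):
    """Return the speaker whose prototype is nearest to the query embedding."""
    return min(protos, key=lambda speaker: math.dist(query, protos[speaker]))


# Toy usage: two speakers, one support sample each, 4-dimensional frames.
support = {
    "speaker_a": [embed([[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]])],
    "speaker_b": [embed([[5.0, 5.0, 6.0, 6.0], [5.0, 5.0, 6.0, 6.0]])],
}
query = embed([[0.1, 0.0, 1.0, 0.9], [0.0, 0.1, 1.1, 1.0]])
predicted = identify(query, prototypes(support))
```

Splitting the feature vector means each subset is processed at a lower dimension, which is the property the abstract credits for the reduced memory and computation; the toy version above shows only the data flow, not those savings.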

Related resources

Prototypical Networks for Few-shot Learning

A recent approach to few-shot classification called matching networks has demonstrated the benefits of coupling metric learning with a training procedure that mimics test. This approach relies on an attention scheme that forms a distribution over all points in the support set, scaling poorly with its size. We propose a more streamlined approach, prototypical networks, that learns a metric space...

Semi-Supervised Few-Shot Learning with Prototypical Networks

We consider the problem of semi-supervised few-shot classification (when the few labeled samples are accompanied by unlabeled data) and show how to adapt Prototypical Networks [10] to this problem. We first show that using larger and better regularized prototypical networks can improve the classification accuracy. We then show further improvements by making use of unlabeled data.

Gaussian Prototypical Networks for Few-Shot Learning on Omniglot

We propose a novel architecture for k-shot classification on the Omniglot dataset. Building on prototypical networks, we extend their architecture to what we call Gaussian prototypical networks. Prototypical networks learn a map between images and embedding vectors, and use their clustering for classification. In our model, a part of the encoder output is interpreted as a confidence region esti...

Feature grouping from spatially constrained multiplicative interaction

We present a feature learning model that learns to encode relationships between images. The model is defined as a Gated Boltzmann Machine, which is constrained such that hidden units that are nearby in space can gate each other’s connections. We show how frequency/orientation “columns” as well as topographic filter maps follow naturally from training the model on image pairs. The model also off...

Dynamic Input Structure and Network Assembly for Few-Shot Learning

The ability to learn from a small number of examples has been a difficult problem in machine learning since its inception. While methods have succeeded with large amounts of training data, research has been underway in how to accomplish similar performance with fewer examples, known as one-shot or more generally few-shot learning. This technique has been shown to have promising performance, but...

Journal

Journal title: IEEE Transactions on Multimedia

Year: 2023

ISSN: 1520-9210, 1941-0077

DOI: https://doi.org/10.1109/tmm.2023.3253301